Abstract: The predictability of a time series is determined by the sensitivity to initial
conditions of its data generating process. In this paper, our goal is to characterize this
sensitivity from a finite sample while assuming few hypotheses on the structure of the data
generating model. In order to measure the distance between two trajectories induced by the
same noisy chaotic dynamic from two close initial conditions, a symmetric Kullback-Leibler
divergence measure is used. Our approach takes into account the dependence of
the residual variance on initial conditions. We show that it is linked to a Fisher information
matrix and we investigate its expression in the cases of covariance-stationary processes
and ARCH($\infty$) processes. Moreover, we propose a consistent non-parametric estimator
of this sensitivity matrix in the case of conditionally heteroscedastic autoregressive
nonlinear processes. Various statistical hypotheses can thus be tested, for instance the
hypothesis that the data generating process is "almost" independently distributed at a
given moment. Applications to simulated data and to the S&P 500 index illustrate our
findings. More particularly, we highlight a significant relationship between the sensitivity
to initial conditions of the daily returns of the S&P 500 and their volatility.

Keywords: Chaos theory; Sensitivity to initial conditions; Non-linear predictability; Time series. 



Introduction 

Stock price dynamics are difficult to approximate because of the various factors influencing
supply-demand interactions. These factors can be of a political, monetary,
economic or psychological nature and are difficult to measure in real time. However,
for an investor wishing to preserve his capital, modelling the price dynamics is a necessary
task to quantify investment risks and to hedge his portfolio. In this regard, the existence
of exploitable deterministic chaotic dynamics has become one of the key questions in the
academic literature investigating nonlinear dynamics in financial and economic time series
(see Brock (1986), Hsieh (1991), Peters (1994), Hommes (2001), Shintani and Linton
(2003), Kyrtsou et al. (2004), Hommes and Manzan (2005)).

The idea behind a chaotic data generating process is that future realizations of this
process can be approximated by the realizations that followed past states close to the
current one. Forecasts of such a time series can thus be performed just by weighting some
selected past observations. This is due to the fact that two trajectories induced by the
same nonlinear chaotic dynamic will remain close, up to a certain time horizon, if they are
generated from two close initial conditions. In contrast, when the data generating process
is independent from initial conditions, as in the case of independently distributed processes,
further realizations cannot be determined from past values. Measuring the sensitivity of
a time series to initial conditions can thus indicate whether it can be predicted just by
using its past values.

In the literature, numerous nonlinear parametric models have been proposed to model
economic and financial time series, such as the GARCH models (Engle (1982), Bollerslev
(1986)), the threshold models (Tong (1983)) or the hidden Markov models (Mamon and
Elliott (2007)). However, when we observe real-world time series, we do not know the
structure of the data generating process. Nonparametric regression techniques represent
an alternative to these nonlinear parametric models, assuming fewer hypotheses on the
model structure. When time series are generated from a deterministic chaotic system
with an added stochastic measurement noise, these regression methods can be applied to
estimate the underlying chaotic dynamic. In this framework, the chaotic component, also
called the "skeleton" (Tong (1990)), models the sensitivity of the considered system to its
initial conditions up to a certain time horizon, while the stochastic component introduces
a part of unpredictability within the data. More particularly, the stochastic perturbation can
display heteroscedasticity, i.e. a time-varying conditional variance, which is a common
feature in economic and financial time series (see Bollerslev, Chou and Kroner (1992)).
In that case, the conditional expectation of the process and the conditional variance of the
dynamic noise can both depend on initial conditions. That is why we are interested in
estimating the dependence on initial conditions of such a noisy chaotic dynamic by using
nonparametric regression techniques.

Several methods already exist to measure the dependence of a time series on initial
conditions. Two widely used methods are the correlation dimension introduced by Grassberger
and Procaccia (1983) and the Lyapunov exponent (see Wolf et al. (1985), Rosenstein,
Collins and De Luca (1993)). Such methods were initially created for deterministic
data generating processes which are not perturbed by dynamic noise (see Dämming
and Mitschke (1993), Tanaka, Aihara and Taki (1998)). Nonetheless, when a stochastic
noise is assumed, several studies have proposed to estimate the deterministic conditional
expectation of the process and its derivatives by using non-parametric regression
tools (such as local polynomial regressions, neural network regressions, etc.).
It should be noted that they generally assume a constant residual variance. They next
compute a correlation dimension (Kawaguchi and Yanagawa (2001), Kawaguchi et al.
(2005)) or a Lyapunov exponent (McCaffrey et al. (1992), Nychka et al. (1992), Gençay
(1996), Lu and Smith (1997), Shintani and Linton (2004)) from the estimated conditional
expectation of the process.

However, these methods were originally developed for deterministic systems, which is
why several studies question their estimation in a stochastic context. For instance,
Schittenkopf, Dorffner and Dockner (2000) used neural network regressions to estimate the
Lyapunov exponent of random dynamical systems and found results that are difficult to
interpret. Dennis et al. (2003) developed examples of ecological population models in which a
Lyapunov exponent estimated from raw data leads to conclusions opposite to those that
can be deduced from a Lyapunov exponent estimated from the deterministic conditional
expectation of the process. Kyrtsou and Serletis (2006) estimated a significantly negative
Lyapunov exponent from daily returns of the USD/CAD exchange rate and remarked that
the presence of dynamic noise makes it impossible to distinguish between noisy chaos and
pure randomness. It therefore seems worthwhile to develop other methods to analyse the
dependence of a noisy chaotic system on its initial values.

For this purpose, in this paper, we propose the use of a symmetric Kullback-Leibler
divergence measure applied to two distributions having different initial conditions. This
measure can be linked to a Fisher information matrix (see Yao and Tong (1994)). The
appeal of this method rests on the fact that it takes into account the dependence
of the residual variance on initial conditions. Schittenkopf, Dorffner and Dockner (2000)
already studied this approach and gave expressions of such a Fisher information matrix
when data are generated by stationary autoregressive models. Here the Fisher information
matrix is estimated with local polynomial regressions, which makes it possible to handle
some non-stationary time series. A test based on this approach is then proposed to
quantify the dependence on initial conditions of the data generating process. The finite
sample properties of our approach are investigated through a simulation study and an
application to the S&P 500.

Our paper proceeds as follows. In Section 1, a measure of the divergence of two initially
nearby trajectories is introduced. We show that it is linked to a Fisher information matrix.
Its expression is given in the case of conditionally heteroscedastic nonlinear autoregressive
processes and in the particular cases of covariance-stationary and ARCH($\infty$) processes. In
Section 2, an estimation of the Fisher information matrix characterizing the dependence of
the data generating process on initial conditions is presented. The asymptotic properties
of this estimator are studied. In Section 3, a statistical test is proposed with the aim of
testing the dependence on initial conditions of a data generating process from a finite sample
of its realizations. In Section 4, applications to simulated data and to the S&P 500 index
are performed. Finally, the last section concludes.

1 Measuring dependences on initial conditions in a 
noisy chaos context 

Let $(x_t)_{t \in [1,\dots,T]}$ be an observed time series. According to Takens' delay embedding
theorem (Takens (1981)) and its generalisations (Sauer et al. (1991)), if these observations
are generated from a dynamic of states following some regularity conditions (see Takens
(1981) for more details), the dynamic of these observations is fully captured in the
$d$-dimensional phase space defined by the delay vectors $X(t, \tau, d)$:

$$\forall t \in [(d-1)\tau + 1, \dots, T], \quad X(t, \tau, d) = (x_t, x_{t-\tau}, \dots, x_{t-(d-1)\tau})^{Tr} \qquad (1.1)$$

where $\tau \in \mathbb{N}$ is the time delay, $d \in \mathbb{N}^*$ is a sufficiently large embedding dimension and $X^{Tr}$
denotes the transposed vector of $X$. In the sequel of this paper, $X(t, \tau, d)$ will be denoted
by $X_t$ in order to simplify the notations. In this framework, we thus have $x_{t+s} = f(X_t)$
for $s \in ]0, +\infty[$ where $f$ is a deterministic function respecting some regularity conditions.
However, real observations often display a behaviour which seems generated by a mix
between a totally deterministic process and a totally stochastic process. That is why we
consider the following conditionally heteroscedastic nonlinear autoregressive process in
this paper:

$$x_{t+s} = f(X_t) + g^{1/2}(X_t)\epsilon_t \qquad (1.2)$$

where : 

• $f$ and $g$ belong to regular spaces of functions, $g$ being a strictly positive variance
function.

• $\epsilon_t$ is a random variable with an independent Gaussian distribution centered on 0
and with a variance of 1.

$f(X_t)$ represents a component of the signal which is sensitive to initial conditions while
$g^{1/2}(X_t)\epsilon_t$ represents a random component whose variance is sensitive to the same initial
conditions. In practice, the regular spaces of functions are generally specified in order to
use a specific statistical method to estimate $f$ and $g$ from the observations. In the sequel
of this paper, we will assume that $f$ and $g$ are four times differentiable on $\mathbb{R}^d$.
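
As a minimal sketch of the delay-vector construction in (1.1), assuming NumPy and 0-based array indices (the helper name `delay_vectors` is hypothetical and does not come from the paper):

```python
import numpy as np

def delay_vectors(x, tau, d):
    """Return an array whose rows are (x_t, x_{t-tau}, ..., x_{t-(d-1)tau})."""
    x = np.asarray(x, dtype=float)
    start = (d - 1) * tau              # first 0-based index with a full history
    if start >= len(x):
        raise ValueError("series too short for this (tau, d)")
    # Column j holds the series lagged by j * tau.
    cols = [x[start - j * tau: len(x) - j * tau] for j in range(d)]
    return np.column_stack(cols)

x = np.arange(10.0)                    # toy series 0, 1, ..., 9
X = delay_vectors(x, tau=2, d=3)
print(X[0])                            # delay vector at 0-based time index 4: [4. 2. 0.]
```

Each row of `X` can then play the role of $X_t$ in model (1.2).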



1.1 A divergence measure of two nearby trajectories 

As remarked by Yao and Tong (1994), a symmetric Kullback-Leibler divergence, also
known as the J-divergence (see Jeffreys (1946)), can be used to quantify the divergence
of two initially nearby trajectories in the framework (1.2). It is equal to

$$KL(t, s, \delta) = \int_{\mathbb{R}} \left( p(x_{t+s}|X_t + \delta) - p(x_{t+s}|X_t) \right) \log\left( \frac{p(x_{t+s}|X_t + \delta)}{p(x_{t+s}|X_t)} \right) dx_{t+s} \qquad (1.3)$$

where $p(x_{t+s}|X_t)$ is the probability density function of $x_{t+s}$ conditioned on the delay vector
$X_t$, and where $\delta \in \mathbb{R}^d$ is a vector representing the difference between two initially nearby
trajectories. More particularly, in our framework (1.2)

$$p(x_{t+s}|X_t) = \frac{1}{\sqrt{2\pi g(X_t)}} \exp\left( -\frac{(x_{t+s} - f(X_t))^2}{2 g(X_t)} \right)$$

For a fixed $\delta$, the more the system depends on its initial conditions, the higher $KL(t, s, \delta)$
will be. Note that $KL(t, s, \delta)$ is non-negative, symmetric in $p(x_{t+s}|X_t)$
and $p(x_{t+s}|X_t + \delta)$, and equal to 0 if and only if $p(x_{t+s}|X_t) = p(x_{t+s}|X_t + \delta)$. A Taylor
expansion of $p(x_{t+s}|X_t + \delta)$ up to the first order in (1.3) allows us to write:

$$KL(t, s, \delta) = \delta^{Tr} I(X_t) \delta + o_{\delta \to 0_d}(\|\delta\|^2) \qquad (1.4)$$

with $I(X_t)$ the Fisher information matrix defined by:

$$I(X_t) = \int_{\mathbb{R}} \frac{1}{p(x_{t+s}|X_t)} \nabla p(x_{t+s}|X_t) \nabla p(x_{t+s}|X_t)^{Tr} dx_{t+s}$$

where $\nabla$ denotes the gradient operator with respect to the coordinates of $X_t$. $I(X_t)$ is
a matrix of dimensions $d \times d$ which we shall also call a sensibility matrix. The next
proposition, taken from Schittenkopf, Dorffner and Dockner (2000), gives the expression
of $I(X_t)$ in our framework (1.2):

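Proposition 1. Let us assume that $f \in C^4(\mathbb{R}^d)$ and $g \in C^4(\mathbb{R}^d)$. Hence

$$I(X_t) = \frac{1}{g(X_t)} \nabla f(X_t) \nabla f(X_t)^{Tr} + \frac{1}{2 g(X_t)^2} \nabla g(X_t) \nabla g(X_t)^{Tr} \qquad (1.5)$$

A proof of this proposition is given in the Appendix. More particularly, in the case of
a constant variance $g(X_t) = \sigma^2 > 0$, we have

$$I(X_t) = \frac{1}{\sigma^2} \nabla f(X_t) \nabla f(X_t)^{Tr} \qquad (1.6)$$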

Let us consider various situations to interpret $I(X_t)$ in this case:

• If $\sigma^2 \to +\infty$ and $0 < \|\nabla f(X_t)\|_2 < +\infty$, the variance of the noise term $\sigma\epsilon_t$ masks the
sensitivity of the system to its initial conditions and the symmetric Kullback-Leibler
divergence will thus tend toward 0.

• If $0 < \sigma^2 < +\infty$ and $\|\nabla f(X_t)\|_2 = 0$, the symmetric Kullback-Leibler divergence
is equal to 0 and the system is clearly not sensitive to its initial conditions since
$f(X_t) = f(X_t + \delta)$ when $\delta \to 0_d$.

• If $\sigma^2 \to 0$ and $0 < \|\nabla f(X_t)\|_2 < +\infty$, the system will depend totally on its
initial conditions because the noise term $\sigma\epsilon_t$ will tend to 0 in probability.
Consequently, $KL(t, s, \delta)$ will tend to $+\infty$.
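
The link between the symmetric divergence and the sensibility matrix of Proposition 1 can be checked numerically. The sketch below assumes NumPy and uses the closed-form symmetric Kullback-Leibler divergence between two univariate Gaussian laws; the functions `f` and `g` are hypothetical toy choices, not taken from the paper.

```python
import numpy as np

def fisher_matrix(grad_f, g, grad_g):
    """Sensibility matrix for a Gaussian model with mean f and variance g."""
    grad_f = np.asarray(grad_f, dtype=float)
    grad_g = np.asarray(grad_g, dtype=float)
    return np.outer(grad_f, grad_f) / g + np.outer(grad_g, grad_g) / (2.0 * g ** 2)

def j_divergence_gauss(m1, v1, m2, v2):
    """Closed-form symmetric KL divergence between N(m1, v1) and N(m2, v2)."""
    return 0.5 * (v1 / v2 + v2 / v1 - 2.0 + (m1 - m2) ** 2 * (1.0 / v1 + 1.0 / v2))

# Hypothetical toy model with d = 2 (illustration only).
f = lambda X: np.sin(X[0]) + 0.5 * X[1]
g = lambda X: 0.1 + 0.05 * X[0] ** 2
grad_f = lambda X: np.array([np.cos(X[0]), 0.5])
grad_g = lambda X: np.array([0.1 * X[0], 0.0])

X = np.array([0.3, -0.2])
delta = np.array([1e-3, 2e-3])                     # small perturbation of X_t
I = fisher_matrix(grad_f(X), g(X), grad_g(X))
exact = j_divergence_gauss(f(X + delta), g(X + delta), f(X), g(X))
approx = delta @ I @ delta                         # quadratic form of (1.4)
print(exact, approx)
```

For a small perturbation the two printed values should agree to leading order, and the gap should shrink as `delta` is scaled down.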

1.2 Dependence on initial conditions of covariance-stationary
processes and ARCH($\infty$) processes

In this part, we study the dependence on initial conditions of specific random processes 
widely used in econometrics by computing the previously introduced sensibility matrix. 

Covariance-stationary processes. The Wold decomposition (see Hamilton (1994),
p. 109) ensures that any zero-mean covariance-stationary process can be represented as

$$x_t = \mu_t + l_t$$

where:

• $l_t = \sigma \sum_{j=0}^{\infty} \phi_j \epsilon_{t-j}$ with $\epsilon_t$ a standard white noise, $\sigma > 0$ a variance parameter,
$\phi_0 = 1$ and $\sum_{j=0}^{\infty} \phi_j^2 < +\infty$.

• $\mu_t$ is a linearly deterministic component of $x_t$ which we denote by
$\mu_t = \alpha_0 + \sum_{j=1}^{\infty} \alpha_j x_{t-j}$.

By considering our previous notations, we have $X_t = (x_{t-1}, x_{t-2}, \dots, x_{t-j}, \dots)$, $f(X_t) = \mu_t$
and $g^{1/2}(X_t)\epsilon_t = l_t$. In that case, $f(X_t)$ clearly depends on $X_t$ while $g(X_t) = (l_t/\epsilon_t)^2$
is independent of $X_t$. So we have $\partial f(X_t)/\partial x_{t-j} = \alpha_j$ and $\nabla g(X_t) = 0$, which gives

$$I(X_t)_{ij} = \left(\frac{\epsilon_t}{l_t}\right)^2 \alpha_i \alpha_j \quad \forall (i, j) \in \mathbb{N}^{*2}$$

where $I(X_t)_{ij}$ denotes the coefficient on the $i$-th row and $j$-th column of the matrix $I(X_t)$.
The sensitivity to initial conditions is thus determined by the level of the ratio $(l_t/\epsilon_t)^2$
but also by the products of the parameters $\alpha_j$.

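Remark 1.1. Any ARMA(p,q) process of the form

$$\left(1 - \sum_{i=1}^{p} \alpha_i L^i\right) x_t = \alpha_0 + \left(1 + \sum_{j=1}^{q} \phi_j L^j\right) \sigma \epsilon_t$$

where $L$ is the lag operator, $\epsilon_t$ is a standard white noise and where $(\alpha_i)$ and $(\phi_j)$ are real
parameters, has a $p \times p$ sensibility matrix equal to

$$I(X_t)_{ij} = \frac{\epsilon_t^2}{\sigma^2 \left(\epsilon_t + \sum_{k=1}^{q} \phi_k \epsilon_{t-k}\right)^2} \alpha_i \alpha_j \quad \forall (i, j) \in [1, \dots, p]^2$$

In the specific case of an autoregressive process of order $p$, we have

$$I(X_t)_{ij} = \frac{1}{\sigma^2} \alpha_i \alpha_j \quad \forall (i, j) \in [1, \dots, p]^2$$

which means that the sensitivity to initial conditions of any autoregressive process does
not change over time.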

ARCH($\infty$) processes. Now, we consider an ARCH($\infty$) process, which can be written as

$$x_t = \sigma_t \epsilon_t$$

where $\epsilon_t$ is a standard white noise and

$$\sigma_t^2 = \omega_0 + \sum_{j=1}^{\infty} \omega_j x_{t-j}^2$$

where $\omega_j > 0$ and $\sum_{j=1}^{\infty} \omega_j < +\infty$. Let us recall that ARCH(m) and GARCH(p, q)
processes can be considered as particular cases of such a process (see Hamilton (1994),
p. 665). Here, we have $f(X_t) = 0$ and $g(X_t) = \sigma_t^2$, which gives $\partial g(X_t)/\partial x_{t-j} = 2\omega_j x_{t-j}$. Hence,

$$I(X_t)_{ij} = \frac{2 \omega_i \omega_j x_{t-i} x_{t-j}}{\sigma_t^4}$$

In that case, the dependence on initial conditions is a function of the parameters $\omega_j$ but
also of the past values of the time series. This dependence is thus time-varying, similarly
to a moving average process and contrary to an autoregressive process.
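
To make the contrast between the two cases concrete, here is a small sketch assuming NumPy. The helper names `fisher_ar` and `fisher_arch` and all parameter values are hypothetical; the ARCH series is truncated to a finite number of lags m for illustration.

```python
import numpy as np

def fisher_ar(alpha, sigma):
    """I(X_t) for an AR(p) process: alpha_i * alpha_j / sigma^2 (constant in time)."""
    alpha = np.asarray(alpha, dtype=float)
    return np.outer(alpha, alpha) / sigma ** 2

def fisher_arch(omega0, omega, x_past):
    """I(X_t) for an ARCH(m) process, with x_past = (x_{t-1}, ..., x_{t-m})."""
    omega = np.asarray(omega, dtype=float)
    x_past = np.asarray(x_past, dtype=float)
    sigma2 = omega0 + np.sum(omega * x_past ** 2)   # conditional variance g(X_t)
    grad_g = 2.0 * omega * x_past                   # derivative of g w.r.t. x_{t-j}
    return np.outer(grad_g, grad_g) / (2.0 * sigma2 ** 2)

I_ar = fisher_ar([0.5, 0.2], sigma=0.1)
# Same ARCH(2) parameters, two different pasts: the matrix changes with the data.
I_a = fisher_arch(1e-5, [0.3, 0.2], x_past=[0.001, -0.001])
I_b = fisher_arch(1e-5, [0.3, 0.2], x_past=[0.05, -0.001])
print(I_ar)
print(I_a[0, 0], I_b[0, 0])
```

The autoregressive matrix depends only on the parameters, whereas the ARCH matrix has to be recomputed at every date from the most recent observations.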



When we work with real time series, a major issue consists of estimating $I(X_t)$ without
knowledge of the data generating process. For this purpose, we propose, in the rest
of this paper, a consistent estimator of the sensibility matrix displayed in Proposition
1 by using local polynomial non-parametric regressions. This estimator makes it possible to
measure the dependence on initial conditions of any observed time series whose dynamic
respects our framework (1.2).



2 Local estimation of the sensibility matrix 

The Fisher information matrix displayed in Proposition 1 depends on $\nabla f(X_t)$, $\nabla g(X_t)$
and $g(X_t)$. In order to estimate these quantities, we propose to use local polynomial
non-parametric regression, which is a widely used method with various advantages
(see Fan and Gijbels (1996) for more details).



2.1 Estimation by local polynomial regression 

This method begins with the following two steps:

• Phase space reconstruction of the observed time series $x_t$. This step consists
of estimating the time delay $\tau$ and the minimal embedding dimension $d_{min}$.
References can be made to Fraser and Swinney (1986) or Moon, Rajagopalan
and Lall (1995) concerning the estimation of $\tau$. Several algorithms are available for
the estimation of $d_{min}$ (see for instance Fraser and Swinney (1986), Kennel, Brown
and Abarbanel (1992), Cao (1997), Kantz and Schreiber (2003)). Popular methods
are the False Nearest Neighbors method (Fraser and Swinney (1986)) or the Cao
method (Cao (1997)). When $\tau$ and $d_{min}$ are adequately chosen, the delay vectors
can be reconstructed as in (1.1) by taking $d \geq d_{min}$.

• Searching the $k$ nearest neighbors of each delay vector $X_t$ by using a Euclidean
distance. If each delay vector is compared to every other one, this method needs
$O(T^2)$ operations. The number of operations can be reduced to $O(T \log(T))$ by
using the k-d tree method (see Bentley (1975), Friedman, Bentley and Finkel (1977)).
The C++ library ANN provides such algorithms (see Arya et al. (1998)). In
the sequel, we will write $t_i$ for the instant of the $i$-th nearest neighbor of $X_t$.
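
The neighbour search of the second step can be sketched as follows, assuming NumPy. This is the brute-force variant (one pass over all delay vectors per query, hence $O(T^2)$ over the whole series), not the k-d tree method; the helper name `k_nearest` is hypothetical.

```python
import numpy as np

def k_nearest(X, t, k):
    """Indices of the k delay vectors closest to X[t] (Euclidean), excluding t itself."""
    dist = np.linalg.norm(X - X[t], axis=1)
    dist[t] = np.inf                         # never return the query point
    idx = np.argpartition(dist, k)[:k]       # the k smallest distances, unordered
    return idx[np.argsort(dist[idx])]        # sorted from nearest to farthest

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.3]])
print(k_nearest(X, t=0, k=2))                # -> [2 4]
```

In practice only neighbours $t_i$ with $t_i + s \leq T$ are usable, since the realization following each neighbour is needed afterwards.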



Estimation of $f(X_t)$ and $\nabla f(X_t)$. Let $X_{t_i}$ be in a neighborhood of $X_t$. $f(X_{t_i})$ can
thus be approximated by the Taylor series expansion up to the second order given by:

$$f(X_{t_i}) \approx f(X_t) + (X_{t_i} - X_t)^{Tr} \nabla f(X_t) + \frac{1}{2} (X_{t_i} - X_t)^{Tr} \nabla^2 f(X_t) (X_{t_i} - X_t) \qquad (2.1)$$

where $\nabla^2 f(X_t)$ is the Hessian matrix of $f$ with respect to the coordinates of $X_t$. The local
non-parametric polynomial regression is based on this approach and consists of minimising
the locally weighted sum of squared residuals:



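$$\hat{B}_{f,H}(t) = \arg\min_{B} \, (P_s(t) - D(t)B)^{Tr} \mathbf{K}_H(t) (P_s(t) - D(t)B) \qquad (2.2)$$

where:

• $$D(t) = \begin{pmatrix} 1 & (X_{t_1} - X_t)^{Tr} & \mathrm{vech}((X_{t_1} - X_t)(X_{t_1} - X_t)^{Tr})^{Tr} \\ \vdots & \vdots & \vdots \\ 1 & (X_{t_k} - X_t)^{Tr} & \mathrm{vech}((X_{t_k} - X_t)(X_{t_k} - X_t)^{Tr})^{Tr} \end{pmatrix}$$
where $\mathrm{vech}(A)$ denotes a vector containing the columns on and below the diagonal
of a matrix $A$. $(X_{t_i})_{i \in [1,\dots,k]}$ represents the $k$ nearest delay vectors of $X_t$ in the sense
of the Euclidean distance.

• $P_s(t) = (x_{t_1+s}, \dots, x_{t_k+s})^{Tr}$ contains the realizations following the $k$ nearest
neighbors of the delay vector $X_t$.

• $\mathbf{K}_H(t)$ is a weighting matrix such that

$$\mathbf{K}_H(t) = (K_H(X_{t_1} - X_t) \; \dots \; K_H(X_{t_k} - X_t))^{Tr} I_k \qquad (2.3)$$

that is, the $k \times k$ diagonal matrix whose $i$-th diagonal entry is $K_H(X_{t_i} - X_t)$,
where $K_H(X) = \frac{1}{|H|} K(H^{-1}X)$ with $K$ a kernel function and $I_k$ is an identity
matrix of dimension $k \times k$. For simplicity we consider $K$ spherically symmetric and
$H = hI_d$, where $h \in \mathbb{R}$ is a strictly positive bandwidth, in the sequel of this paper.
A common choice for $K$ is the standard normal density function
$K(X) = (2\pi)^{-d/2} e^{-\|X\|^2/2}$.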



Setting the first derivative of (2.2) with respect to $B$ to zero gives

$$\hat{B}_{f,H}(t) = (D(t)^{Tr} \mathbf{K}_H(t) D(t))^{-1} D(t)^{Tr} \mathbf{K}_H(t) P_s(t) \qquad (2.4)$$

where $\hat{B}_{f,H}(t)$ is an estimation of $B_f(t) = (f(X_t) \;\; \nabla f(X_t)^{Tr} \;\; \mathrm{vech}(L)^{Tr})^{Tr}$ with
$L_{ij} = \nabla^2 f(X_t)_{ij}$ if $i \neq j$ and $L_{ij} = \frac{1}{2} \nabla^2 f(X_t)_{ij}$ if $i = j$. The 1st coordinate of
$\hat{B}_{f,H}(t)$ corresponds to an estimation of $f(X_t)$, denoted $\hat{f}_h(X_t)$ in the case $H = hI_d$,
while the 2nd to $(d+1)$-th coordinates correspond to an estimation of $\nabla f(X_t)$ denoted
$\widehat{\nabla f}_h(X_t)$.
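
A minimal sketch of the weighted least squares step (2.4), assuming NumPy, a Gaussian kernel and $H = hI_d$. The function names and the noise-free toy target are hypothetical; they only serve to check that the fit recovers the value and the gradient at $X_t$.

```python
import numpy as np

def design_matrix(neigh, x0):
    """Rows (1, u^Tr, vech(u u^Tr)^Tr) with u = X_{t_i} - X_t, as in D(t)."""
    U = neigh - x0
    d = U.shape[1]
    # vech stacks the columns on and below the diagonal: pairs (i, j) with i >= j.
    quad = [U[:, i] * U[:, j] for j in range(d) for i in range(j, d)]
    return np.column_stack([np.ones(len(U)), U] + quad)

def local_quadratic_fit(neigh, targets, x0, h):
    """Weighted least squares with Gaussian kernel weights and H = h * I_d."""
    U = neigh - x0
    d = U.shape[1]
    w = np.exp(-0.5 * np.sum((U / h) ** 2, axis=1)) / ((2 * np.pi) ** (d / 2) * h ** d)
    D = design_matrix(neigh, x0)
    DtK = D.T * w                              # D^Tr K_H without building the k x k matrix
    B = np.linalg.solve(DtK @ D, DtK @ targets)
    return B[0], B[1:d + 1]                    # estimates of f(X_t) and of its gradient

rng = np.random.default_rng(0)
x0 = np.array([0.2, -0.1])
neigh = x0 + 0.05 * rng.standard_normal((30, 2))   # 30 synthetic neighbours of x0
f = lambda X: 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] * X[:, 1]
f_hat, grad_hat = local_quadratic_fit(neigh, f(neigh), x0, h=0.1)
print(f_hat, grad_hat)
```

Because the toy target is itself quadratic and noise-free, the fit should reproduce $f(X_t)$ and $\nabla f(X_t)$ up to rounding error; with noisy data the bandwidth $h$ and the number of neighbours drive the quality of the estimate.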

Estimation of $g(X_t)$ and $\nabla g(X_t)$. Now let us denote the residual vector $\delta(t) = P_s(t) - D(t)\hat{B}_{f,H}(t)$
and $\delta^2(t)$ the vector containing the squares of the coordinates of $\delta(t)$.

With the aim of estimating $g(X_t)$ and $\nabla g(X_t)$, we use the following local non-parametric
estimator based on the residuals of the local non-parametric estimation of $B_f(t)$:

$$\hat{B}_{\delta^2,H_2}(t) = (D(t)^{Tr} \mathbf{K}_{H_2}(t) D(t))^{-1} D(t)^{Tr} \mathbf{K}_{H_2}(t) \delta^2(t) \qquad (2.5)$$

Similarly to $\hat{B}_{f,H}(t)$, the 1st coordinate of $\hat{B}_{\delta^2,H_2}(t)$ corresponds to an estimation of
$E[(x_{t+s} - \hat{f}_h(X_t))^2]$ denoted by $\hat{g}_{h_2}(X_t)$ in the case $H_2 = h_2 I_d$, while the 2nd to $(d+1)$-th
coordinates correspond to an estimation of $\nabla E[(x_{t+s} - \hat{f}_h(X_t))^2]$ denoted by $\widehat{\nabla g}_{h_2}(X_t)$.

Remark 2.1. Note that the matrices $D(t)^{Tr} \mathbf{K}_H(t) D(t)$ and $D(t)^{Tr} \mathbf{K}_{H_2}(t) D(t)$
need to be invertible in (2.4) and (2.5). When two nearest neighbours are close to each
other, their respective columns in these matrices are also close and the matrices will thus
have determinants close to 0. To bypass this numerical problem, one of the two close
neighbours can be eliminated. Another method consists of using a regression on principal
components by transforming the columns of the matrix $\mathbf{K}_H^{1/2}(t) D(t)$ into a matrix with
orthonormal columns (see Jolliffe (2002) for more details).



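Finally, we propose an estimator of the sensibility matrix displayed in Proposition 1
by using (2.4) and (2.5):

$$\hat{I}_{h,h_2}(X_t) = \frac{1}{\hat{g}_{h_2}(X_t)} \widehat{\nabla f}_h(X_t) \widehat{\nabla f}_h(X_t)^{Tr} + \frac{1}{2 \hat{g}_{h_2}(X_t)^2} \widehat{\nabla g}_{h_2}(X_t) \widehat{\nabla g}_{h_2}(X_t)^{Tr} \qquad (2.6)$$

where $h$ and $h_2$ are two selected bandwidths. In the next section, we investigate the
asymptotic properties of $\hat{I}_{h,h_2}(X_t)$ displayed in (2.6).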
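
The two-stage construction behind (2.6) can be sketched as below, assuming NumPy and a Gaussian kernel whose normalising constant is dropped because it cancels in weighted least squares. The function names and the synthetic one-dimensional data are hypothetical, and the exponential summary returned at the end is the statistic built from the sum of the coefficients of the estimated matrix.

```python
import numpy as np

def local_fit(U, y, h):
    """Local quadratic weighted least squares; U holds the differences X_{t_i} - X_t."""
    k, d = U.shape
    quad = [U[:, i] * U[:, j] for j in range(d) for i in range(j, d)]
    D = np.column_stack([np.ones(k), U] + quad)
    w = np.exp(-0.5 * np.sum((U / h) ** 2, axis=1))    # unnormalised Gaussian weights
    DtK = D.T * w
    return D, np.linalg.solve(DtK @ D, DtK @ y)

def sensibility_matrix(neigh, targets, x0, h, h2):
    """First fit f, then fit the squared residuals, then assemble the matrix."""
    U = neigh - x0
    d = U.shape[1]
    D, B_f = local_fit(U, targets, h)
    resid2 = (targets - D @ B_f) ** 2                  # squared residuals
    _, B_g = local_fit(U, resid2, h2)
    grad_f, g_hat, grad_g = B_f[1:d + 1], B_g[0], B_g[1:d + 1]
    if g_hat <= 0:
        raise ValueError("non-positive variance estimate; change h2 or k")
    I_hat = np.outer(grad_f, grad_f) / g_hat + np.outer(grad_g, grad_g) / (2 * g_hat ** 2)
    return I_hat, np.exp(-I_hat.sum())

# Synthetic neighbours of x0 drawn from a linear model with constant variance 0.1**2.
rng = np.random.default_rng(1)
x0 = np.array([1.0])
neigh = x0 + 0.2 * rng.standard_normal((400, 1))
targets = 0.5 + 0.5 * neigh[:, 0] + 0.1 * rng.standard_normal(400)
I_hat, S_hat = sensibility_matrix(neigh, targets, x0, h=0.5, h2=0.5)
print(I_hat, S_hat)   # I_hat[0, 0] should be of the order of 0.5**2 / 0.1**2 = 25
```

The guard on `g_hat` is a design choice of this sketch: a local regression on squared residuals is not constrained to be positive, so a failed fit is reported rather than silently producing a negative variance.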



2.2 Asymptotic consistency of $\hat{I}_{h,h_2}(X_t)$

The following theorem gives the asymptotic consistency of $\hat{I}_{h,h_2}(X_t)$ under some general
conditions.



Theorem 2.1. Assume that $f \in C^4(\mathbb{R}^d)$ and $g \in C^4(\mathbb{R}^d)$ in model (1.2). Let us denote by
$\xrightarrow{P}$ the convergence in probability, $\mu_l = \int u_1^l K(u) du$ and $J_l = \int u_1^l K^2(u) du$ where $K$ is
the kernel function already introduced in (2.3) and $u$ is a vector of coordinates $(u_i)_{i \in [1,\dots,d]}$.
Assume moreover that:

• $0 < \mu_l < +\infty$ and $0 < J_l < +\infty$ for all $l \in \mathbb{N}$.

• $(X_{t_i})_{i \in [1,\dots,k]}$ is a multivariate i.i.d. sequence with a marginal density denoted $d_X$ such
that $d_X(X_t) > 0$ and $d_X \in C^1(\mathbb{R}^d)$.

• $h \to 0$, $h_2 \to 0$, $k h^d \to +\infty$, $k h_2^d \to +\infty$, $k h^{d+2} \to +\infty$ and $k h_2^{d+2} \to +\infty$ when
$k \to +\infty$.

Under these general conditions, we have

$$\hat{I}_{h,h_2}(X_t)_{pq} \xrightarrow{P} I(X_t)_{pq}$$

where $\hat{I}_{h,h_2}(X_t)_{pq}$ and $I(X_t)_{pq}$ denote the coefficients on the $p$-th row and $q$-th column of the
matrices $\hat{I}_{h,h_2}(X_t)$ and $I(X_t)$ respectively.



A proof of Theorem 2.1 is given in the Appendix. Note that the assumption
of i.i.d. nearest neighbors can be relaxed as in Lu (1999) (Condition C). In the
next section, we propose to use a bootstrapping technique to test the local dependence
on initial conditions for a finite-length time series.



3 Testing the local sensitivity to initial conditions
within time series



The dynamics following two close delay vectors are considered similar if the symmetric
Kullback-Leibler divergence presented in (1.3) between the densities $p(x_{t+s}|X_{t_i})$ and
$p(x_{t+s}|X_{t_j})$ is close to 0 for $i \in [1, \dots, k]$ and $j \in [1, \dots, k]$, where $X_{t_i}$ and $X_{t_j}$ are two
close delay vectors in the sense of the Euclidean distance. Furthermore, let us recall from
(1.4) that $KL(t, s, \delta) \approx \delta^{Tr} I(X_t) \delta$ when $\|\delta\|_2$ is quite small.
That is why we consider the following statistic:

$$\hat{S}_{h,h_2}(t) = \exp(-\hat{U}_{h,h_2}(t))$$

where

$$\hat{U}_{h,h_2}(t) = \sum_{q=1}^{d} \sum_{p=1}^{d} \hat{I}_{h,h_2}(X_t)_{pq}$$

with $h$ and $h_2$ two selected bandwidths and $d$ a selected embedding dimension. Under
the general conditions of Theorem 2.1 and from the continuous mapping theorem, we thus
have

$$\hat{S}_{h,h_2}(t) \xrightarrow{P} S(t) = \exp\left(-\sum_{q=1}^{d} \sum_{p=1}^{d} I(X_t)_{pq}\right)$$

Under the hypothesis that the time series is independent from initial conditions, i.e.
$I(X_t)_{ij} = 0$ for all $(i, j) \in [1, \dots, d]^2$, $S(t)$ is equal to 1. The more the time series depends
on initial conditions, the lower $S(t)$ will be. If the process is totally deterministic,
we have $S(t) = 0$. In order to test the dependence of a time series on initial conditions,
the following pair of hypotheses can thus be tested:

$$H_0 : S(t) \leq \beta \quad \text{vs.} \quad H_1 : S(t) > \beta$$

where $\beta$ corresponds to a level of dependence on initial conditions. A p-value of this
test can be obtained by estimating the probability $P(\hat{S}_{h,h_2}(t) \leq \beta)$. When $\beta$ is close
to 1, this p-value corresponds to a probability that the data generating process is not
independent from initial conditions. On the other hand, when $\beta$ is close to 0, this p-value
corresponds to a probability that the data generating process is strongly dependent on
initial conditions (almost totally deterministic).

However, the asymptotic distribution of $\hat{S}_{h,h_2}(t)$ depends on unknown quantities such as
the gradients $\nabla f(X_t)$ and $\nabla g(X_t)$ (see the proof of Theorem 2.1). Thus $\hat{S}_{h,h_2}(t)$ cannot
directly be used to build a test and we consider a re-sampling procedure to provide
reliable quantiles for testing the local sensitivity of a time series to initial conditions.
This re-sampling method is inspired by Gençay (1996), who applied a similar method
to approximate the distribution of a maximal Lyapunov exponent.

Firstly, $l$ delay vectors are selected with replacement from all the $k$ selected nearest
neighbors $\{X_{t_i}\}_{i \in [1,\dots,k]}$ of $X_t$. The new selected set, denoted by $\{X^*_{t_i}\}_{i \in [1,\dots,l]}$, is used to
estimate a new $\hat{B}^*_{f,H}(t)$. This estimation gives a new $\hat{B}^*_{\delta^2,H_2}(t)$ and, finally, a
new estimation of the Fisher information matrix $\hat{I}_{h,h_2}(X_t)$ and thus of $\hat{S}_{h,h_2}(t)$, denoted
respectively by $\hat{I}^*_{h,h_2}(X_t)$ and $\hat{S}^*_{h,h_2}(t)$. When this experiment is repeated a
large number of times, it allows us to approximate the distribution of $\hat{S}_{h,h_2}(t)$. In practice,
note that the $k$ nearest neighbors of $X_t$ need to be significantly close to $X_t$
while the number $l$ of selected delay vectors must not be chosen too small, in order to obtain
significant approximations of the sensibility matrix.
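
The re-sampling loop can be sketched as follows, assuming NumPy. The function `bootstrap_p_value` and its `statistic` argument are hypothetical names: `statistic` stands for any routine that maps a resampled neighbour set to a value of the test statistic, and `toy_statistic` is only a stand-in based on a global linear fit, not the local polynomial estimator of the paper.

```python
import numpy as np

def bootstrap_p_value(neigh, targets, statistic, beta, l, n_boot=199, seed=0):
    """Estimate P(S* < beta) by drawing l of the k neighbours with replacement."""
    rng = np.random.default_rng(seed)
    k = len(neigh)
    hits = 0
    for _ in range(n_boot):
        idx = rng.integers(0, k, size=l)          # resampled neighbour indices
        hits += int(statistic(neigh[idx], targets[idx]) < beta)
    return hits / n_boot

def toy_statistic(neigh, targets):
    # Hypothetical stand-in: exp(-slope^2 / residual variance) from a linear fit.
    u = neigh[:, 0]
    slope, intercept = np.polyfit(u, targets, 1)
    resid = targets - (slope * u + intercept)
    return np.exp(-slope ** 2 / max(resid.var(), 1e-12))

rng = np.random.default_rng(1)
neigh = rng.standard_normal((30, 1))              # k = 30 synthetic neighbours
targets = 0.5 * neigh[:, 0] + 0.1 * rng.standard_normal(30)
print(bootstrap_p_value(neigh, targets, toy_statistic, beta=0.1, l=20))
```

Passing the statistic as a callable keeps the resampling logic independent of how the sensibility matrix is estimated, so the same loop can be reused with different bandwidths.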

4 Simulations and empirical applications 

In this section, we first apply our approach to stationary AR(1) processes displaying
a constant dependence on initial conditions and to a GARCH(1,1) process where this
dependence is time-varying. We next test this dependence on the daily returns of the
S&P 500 index. In all our experiments, we fix the prediction horizon $s = 1$.

4.1 Results on AR(1) processes. 

Firstly, let us consider the following AR(1) process:

$$x_{t,\phi_1} = 0.5 + \phi_1 x_{t-1,\phi_1} + \sigma \epsilon_t \qquad (4.1)$$

where $\epsilon_t$ is a standard white noise. The Fisher information matrix measuring the
sensitivity to initial conditions of this process is given by

$$I(X_t)_{ij} = \frac{\phi_1^2}{\sigma^2} \text{ if } i = j = 1, \quad I(X_t)_{ij} = 0 \text{ otherwise.}$$

For our experiments, we fix the variance parameter $\sigma = 0.1$ and we consider three stationary
AR(1) processes corresponding to the cases $\phi_1 = 0.01$, $\phi_1 = 0.5$ and $\phi_1 = 0.95$. For
each stationary AR(1) process, 499 time series of size $T = 1000$ are generated. The values
of their respective theoretical statistical index $S(T)$ will thus be $e^{-0.01}$, $e^{-25}$ and $e^{-90.25}$.
In our experiments, we assume that the time delay $\tau = 1$ and the embedding dimension
$d = 2$. Estimations of the proposed statistic $\hat{S}_{h,h_2}(T)$ are done by using the standard
normal density kernel function in the local polynomial non-parametric regressions and by
fixing $h = h_2 = 0.1$. In our re-sampling procedure (see Section 3), we estimated the
probabilities $P(\hat{S}_{h,h_2}(T) < 0.1)$ and $P(\hat{S}_{h,h_2}(T) < 0.9)$ from 199 iterations by choosing $l = 20$
delay vectors with replacement from $k = 30$ nearest neighbours of the current delay vector
$X_T$. The empirical cumulative distribution functions of these estimated probabilities,
denoted by $P_{\phi_1}$, are assessed by means of Monte Carlo experiments from the 499 generated
time series. Figure 1 and Figure 2 display these distribution functions for the three
stationary AR(1) processes.
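
As a quick check of the theoretical index values quoted for the three AR(1) processes, here is a sketch using only the Python standard library (the helper name `theoretical_index` is hypothetical). With $d = 2$, the only non-zero coefficient of $I(X_t)$ is $\phi_1^2 / \sigma^2$, so the index is the exponential of minus that ratio.

```python
import math

def theoretical_index(phi1, sigma):
    """S = exp(-phi1^2 / sigma^2) for the AR(1) process with noise scale sigma."""
    return math.exp(-(phi1 ** 2) / sigma ** 2)

for phi1 in (0.01, 0.5, 0.95):
    exponent = phi1 ** 2 / 0.1 ** 2      # approximately 0.01, 25 and 90.25
    print(phi1, exponent, theoretical_index(phi1, 0.1))
```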

In view of Figure 1 and Figure 2, we observe that

$$P_{\phi_1=0.01}(P(\hat{S}_{h,h_2}(T) < \beta) < \alpha) > P_{\phi_1=0.5}(P(\hat{S}_{h,h_2}(T) < \beta) < \alpha) > P_{\phi_1=0.95}(P(\hat{S}_{h,h_2}(T) < \beta) < \alpha)$$

for all $\alpha \in [0, 1]$ and $\beta \in \{0.1, 0.9\}$. This illustrates the lower dependence on initial
conditions for $\phi_1 = 0.01$ ($S(t) = e^{-0.01}$) than in the case $\phi_1 = 0.5$ ($S(t) = e^{-25}$), which
itself has a lower dependence on initial conditions than the case $\phi_1 = 0.95$ ($S(t) = e^{-90.25}$).

Note that in the cases $\phi_1 = 0.5$ and $\phi_1 = 0.95$, the theoretical statistical
index $S(t)$ is close to 0. However, Figure 2 clearly shows that the probability
$P_{\phi_1}(P(\hat{S}_{h,h_2}(T) < 0.1) = 0)$ is approximately equal to 0 for $\phi_1 = 0.95$ while it is close
to 0.6 for $\phi_1 = 0.5$. The finite sample estimation of $\hat{S}_{h,h_2}(t)$ is thus rather different from
the theoretical statistical index $S(t)$ in the case $\phi_1 = 0.5$. This difference can be due to
the chosen bandwidths. Indeed, an inappropriate hyper-parametrization can give
importance to delay vectors far from the current delay vector $X_T$ in our non-parametric
regressions and finally lead to spurious conclusions on the independence from initial conditions.

Figure 3 illustrates how the empirical cumulative distribution function of $P(\hat{S}_{h,h_2}(T) <
0.9)$ changes when the bandwidth $h$ varies while the other parameters are
fixed ($k = 30$, $l = 20$ and $h_2 = 0.1$) in the case $\phi_1 = 0.5$. Figure 4 illustrates the
same thing for the bandwidth $h_2$ ($k = 30$, $l = 20$ and $h = 0.1$). In view of Figure 3,
increasing the bandwidth $h$ leads to the conclusion of a higher dependence on initial conditions
of the time series. This can be explained by the fact that too small a bandwidth gives
importance to very few delay vectors, which can lead to spurious conclusions. However,
in view of Figure 4, changing $h_2$ seems to have little impact on the estimation of the
empirical cumulative distribution functions. This is logical because the AR process does not
have a time-varying residual variance. These observations show that the choice of
hyper-parameters in the non-parametric regressions must be made carefully. More specifically,
final conclusions depend entirely on the chosen hyper-parametrization.

- Figure 1 around here -

- Figure 2 around here -

- Figure 3 around here -

- Figure 4 around here -

4.2 Results on GARCH(1,1) process. 

Let us consider the following GARCH(1,1) process:

$$x_t = \sigma_t \epsilon_t \qquad (4.2)$$

$$\sigma_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2 + \beta_1 \sigma_{t-1}^2$$

with $\alpha_0 = 5 \times 10^{-6}$, $\alpha_1 = 0.05$, $\beta_1 = 0.9$ and $\epsilon_t$ a standard white noise. This process is
close to those that can be inferred from daily returns of stock market indices. By
induction, and since $|\beta_1| < 1$, this process can be rewritten as

$$\sigma_t^2 = \frac{\alpha_0}{1 - \beta_1} + \alpha_1 \sum_{i=1}^{+\infty} \beta_1^{i-1} x_{t-i}^2$$

Consequently, a coefficient of the Fisher information matrix is given by

$$I(X_t)_{ij} = \frac{2 \alpha_1^2 \beta_1^{i+j-2} x_{t-i} x_{t-j}}{\sigma_t^4} \qquad (4.3)$$
In order to illustrate the time-varying dependence on initial conditions of such a process, a
time series of size 2000 following (4.2) is generated. We test a hypothesis $H_0$ that the time
series is "almost" independently distributed by assessing the p-value $P(\hat{S}_{h,h_2}(t) < 1 - \varepsilon)$
where $\varepsilon$ is close to 0. In our experiment, we fix $\varepsilon$ = 5 x 10~^. The theoretical statistical
index $S(t)$ and the estimated probability $P(\hat{S}_{h,h_2}(t) < 1 - \varepsilon)$ are next computed from (4.3) for
$t \in [1000, 2000]$ by using a sliding window of size 1000. The probability $P(\hat{S}_{h,h_2}(t) < 1 - \varepsilon)$
is estimated following the previous method used in the case of the AR(1) processes ($k = 30$,
$l = 20$, $h = h_2 = 0.1$). If the probability $P(\hat{S}_{h,h_2}(t) < 1 - \varepsilon) = 0$, the time series can
be considered similar to an "almost" independently distributed process. Figure 5 displays
the obtained outcomes. When the theoretical statistical index $S(t)$ deviates from 1,
we remark that the estimated p-value increases, which indicates that our method captures
well the moments when the series is more predictable. Although $S(t)$ is never
equal to 1, the estimated p-value is often close to 0, which means that the time series can
often be considered as unpredictable from a past window of size 1000 with our
hyper-parametrization.
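
A possible way to simulate such a GARCH(1,1) series and to evaluate the sum of the Fisher information coefficients at a given date is sketched below, assuming NumPy. The function names are hypothetical, the infinite sum is truncated at m lags, and the coefficient used is the one obtained from Proposition 1 with $f = 0$ and $g = \sigma_t^2$.

```python
import numpy as np

def simulate_garch11(T, a0, a1, b1, seed=0):
    """Simulate x_t = sigma_t * eps_t with sigma_t^2 = a0 + a1 x_{t-1}^2 + b1 sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    x = np.zeros(T)
    s2 = np.full(T, a0 / (1.0 - a1 - b1))      # start at the unconditional variance
    for t in range(1, T):
        s2[t] = a0 + a1 * x[t - 1] ** 2 + b1 * s2[t - 1]
        x[t] = np.sqrt(s2[t]) * rng.standard_normal()
    return x, s2

def garch_sensitivity_sum(x, s2, t, a1, b1, m):
    """Sum over i, j <= m of 2 a1^2 b1^(i+j-2) x_{t-i} x_{t-j} / sigma_t^4 (requires t >= m)."""
    lags = np.arange(1, m + 1)
    grad_g = 2.0 * a1 * b1 ** (lags - 1) * x[t - lags]   # d sigma_t^2 / d x_{t-i}
    # The double sum factorises into the square of the sum of the gradient entries.
    return grad_g.sum() ** 2 / (2.0 * s2[t] ** 2)

x, s2 = simulate_garch11(2000, a0=5e-6, a1=0.05, b1=0.9)
U = garch_sensitivity_sum(x, s2, t=1500, a1=0.05, b1=0.9, m=50)
print(U, np.exp(-U))                                      # sum and corresponding index
```

Writing the double sum as a squared single sum avoids building the m x m matrix at every date when the index is tracked over a sliding window.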

- Figure 5 around here -



4.3 Empirical applications to the S&P 500

In this section, we study whether the daily returns of the S&P 500 stock market index are
sensitive to initial conditions by using our approach. The considered time series runs from
04/01/1999 to 18/02/2010 and has a size equal to 3051. It has been extracted
from Datastream.

In order to investigate the predictability of the time series from a past data window of size 1000, we consider a sliding window of size 1000 and apply our method with the same hyper-parameters as in the previous experiments ($k = 30$, $l = 20$, $h = h_2 = 0.1$). The p-value $\mathbb{P}(S_{h,h_2}(t) < 1 - \epsilon)$ is determined for two values of $\epsilon$, the larger being $1 \times 10^{-3}$. Results are displayed in Figure 6. With the smaller $\epsilon$, the p-value varies more between 0 and 1 than in the case $\epsilon = 1 \times 10^{-3}$, where it is often equal to 0. The time series can thus be considered predictable with the smaller $\epsilon$, while it is considered unpredictable with $\epsilon = 1 \times 10^{-3}$.

The dependence on initial conditions is thus low but significant at certain moments. More particularly, we observe that the p-value is higher when the volatility of daily returns is higher, which means that the series is more sensitive to initial conditions in periods of high volatility. Conversely, it is less sensitive in periods of low volatility. Figure 7 illustrates these observations by representing the probability $\mathbb{P}(S_{h,h_2}(t) < 1 - \epsilon)$ as a function of the squared daily returns of the S&P500. The Pearson correlation coefficient between these quantities is approximately equal to 0.3 and is significantly different from zero. More significant relationships are established by examining the Pearson correlation coefficients between their respective 5-day and 20-day moving averages (see Figure 7). In view of Figure 7, these relations are nonlinear. It should be noted that these observations are similar to those made by LeBaron (1992), who showed that there are significant relations between volatility and serial correlations in stock market returns, serial correlations being one way to measure the dependence on past conditions.
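The correlation analysis described above can be sketched as follows. This is an illustrative example only: the function names and the synthetic inputs are hypothetical, and the use of the Fisher z-transform for the 95% confidence interval is an assumption, since the text does not state how the interval was obtained. The interval also treats the pairs as independent, which is only indicative once moving averages introduce serial dependence.

```python
import numpy as np

def moving_average(x, window):
    """Simple moving average over `window` consecutive observations."""
    x = np.asarray(x, dtype=float)
    return np.convolve(x, np.ones(window) / window, mode="valid")

def fisher_ci(r, n, z_crit=1.959964):
    """Confidence interval for a correlation r from n pairs (Fisher z-transform)."""
    z = np.arctanh(r)
    half_width = z_crit / np.sqrt(n - 3)
    return np.tanh(z - half_width), np.tanh(z + half_width)

def pearson_with_ci(x, y, z_crit=1.959964):
    """Pearson correlation between x and y with its Fisher confidence interval."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    lo, hi = fisher_ci(r, len(x), z_crit)
    return r, lo, hi

# Hypothetical stand-ins for the squared returns and the estimated p-values.
rng = np.random.default_rng(0)
squared_returns = rng.standard_normal(2051) ** 2
p_values = 0.3 * squared_returns + rng.standard_normal(2051)

for window in (1, 5, 20):
    r, lo, hi = pearson_with_ci(moving_average(squared_returns, window),
                                moving_average(p_values, window))
    print(window, round(r, 3), round(lo, 3), round(hi, 3))
```

As a consistency check, `fisher_ci(0.298, 2051)` (2051 being the number of sliding-window positions if all 3051 observations are used, which is an assumption) gives an interval close to the one reported in the top panel of Figure 7.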

- Figure 6 around here -

- Figure 7 around here -



5 Conclusion 

In this paper we studied the problem of testing the local sensitivity to initial conditions of time series. Our approach consists in measuring the distance between two trajectories, having different initial conditions and following the same noisy chaotic dynamic, with a symmetric Kullback-Leibler divergence. We showed that this divergence can be characterized by a Fisher information matrix. In this way, we showed that autoregressive processes have a constant dependence on initial conditions while moving average processes or ARCH($\infty$) processes have a time-varying dependence on initial conditions. Because real-world time series have unknown data generating processes, we proposed a framework for testing the time-varying sensitivity to initial conditions of any conditionally heteroscedastic nonlinear autoregressive process by using nonparametric regression techniques. More particularly, we proposed a consistent estimator of the Fisher information matrix characterizing the dependence on initial conditions. We illustrated these theoretical results through a set of numerical experiments. We observed that the hyper-parameters of the non-parametric regressions must be chosen carefully. The outcomes obtained on the daily returns of the S&P500 index show that they are more sensitive to initial conditions in periods of high volatility than in periods of low volatility. Further research could investigate the dependence on initial conditions of other time series with our method.
6 Appendix

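Proof of Proposition 1. We have

$$\nabla \log p(x_{t+s}|X_t) = \frac{x_{t+s} - f(X_t)}{g(X_t)} \nabla f(X_t) + \left( \frac{(x_{t+s} - f(X_t))^2}{2 g(X_t)^2} - \frac{1}{2 g(X_t)} \right) \nabla g(X_t).$$

Hence,

$$I(X_t) = \int p(x_{t+s}|X_t) \, \nabla \log(p(x_{t+s}|X_t)) \, \nabla \log(p(x_{t+s}|X_t))^T \, dx_{t+s}$$

$$= E_{x_{t+s}|X_t}\Big[ \frac{(x_{t+s} - f(X_t))^2}{g(X_t)^2} (\nabla f(X_t))(\nabla f(X_t))^T + \Big( \frac{(x_{t+s} - f(X_t))^3}{2 g(X_t)^3} - \frac{x_{t+s} - f(X_t)}{2 g(X_t)^2} \Big) \big( (\nabla f(X_t))(\nabla g(X_t))^T + (\nabla g(X_t))(\nabla f(X_t))^T \big)$$

$$+ \Big( \frac{1}{4 g(X_t)^2} + \frac{(x_{t+s} - f(X_t))^4}{4 g(X_t)^4} - \frac{(x_{t+s} - f(X_t))^2}{2 g(X_t)^3} \Big) (\nabla g(X_t))(\nabla g(X_t))^T \Big]$$

Because $x_{t+s}$ follows a Gaussian distribution centered on $f(X_t)$, we have $E_{x_{t+s}|X_t}[(x_{t+s} - f(X_t))^2] = g(X_t)$, $E_{x_{t+s}|X_t}[(x_{t+s} - f(X_t))^3] = 0$ and $E_{x_{t+s}|X_t}[(x_{t+s} - f(X_t))^4] = 3 g(X_t)^2$, which gives the result for $I(X_t)$.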
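The moment computation in this proof can be checked numerically. The sketch below assumes the conditional law of $x_{t+s}$ given $X_t$ is Gaussian with mean $f(X_t)$ and variance $g(X_t)$, and compares the expectation of the score outer product, evaluated by Gauss-Hermite quadrature, with the closed form $\nabla f \nabla f^T / g + \nabla g \nabla g^T / (2 g^2)$ that the three Gaussian moments lead to. The function names and the test values of the gradients are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def fisher_closed_form(grad_f, grad_g, g):
    """Closed form implied by the Gaussian moments:
    grad_f grad_f^T / g + grad_g grad_g^T / (2 g^2)."""
    grad_f = np.asarray(grad_f, dtype=float)
    grad_g = np.asarray(grad_g, dtype=float)
    return np.outer(grad_f, grad_f) / g + np.outer(grad_g, grad_g) / (2.0 * g ** 2)

def fisher_by_quadrature(grad_f, grad_g, g, degree=20):
    """E[score score^T] under N(f, g), computed by Gauss-Hermite quadrature."""
    grad_f = np.asarray(grad_f, dtype=float)
    grad_g = np.asarray(grad_g, dtype=float)
    nodes, weights = hermegauss(degree)        # weight function exp(-z^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)   # normalise to N(0, 1) probabilities
    info = np.zeros((grad_f.size, grad_f.size))
    for z, w in zip(nodes, weights):
        r = np.sqrt(g) * z                     # residual x_{t+s} - f(X_t)
        score = (r / g) * grad_f + (r ** 2 / (2.0 * g ** 2) - 1.0 / (2.0 * g)) * grad_g
        info += w * np.outer(score, score)
    return info

# Hypothetical local values of the gradients and of the conditional variance.
grad_f = [0.3, -1.2]
grad_g = [0.05, 0.02]
g = 0.4
print(fisher_by_quadrature(grad_f, grad_g, g))
print(fisher_closed_form(grad_f, grad_g, g))
```

The integrand is a polynomial of degree 4 in the residual, so the quadrature is exact up to rounding and the two matrices should coincide.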

Proof of Theorem 2.1. In what follows, $\xrightarrow[k \to +\infty]{d}$ denotes convergence in distribution and $\xrightarrow[k \to +\infty]{p}$ denotes convergence in probability. Let us consider the following theorem:



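Theorem 6.1. Assume that $f$ and $g$ in model (1.2) are sufficiently smooth functions on $\mathbb{R}^d$. Let us denote $\mu_l = \int u_1^l K(u) \, du$ and $J_l = \int u_1^l K^2(u) \, du$, where $K$ is the kernel function already introduced in (2.3) and $u$ is a vector of coordinates $(u_i)_{i \in [1,\dots,d]}$. Assume moreover that:

• $0 < \mu_l < +\infty$ and $0 < J_l < +\infty$ for all $l \in \mathbb{N}$.

• $\{X_{t_i}\}_{i \in [1,\dots,k]}$ is a multivariate i.i.d. sequence with a marginal density denoted $d_X$ such that $d_X(X_t) > 0$ and $d_X$ is sufficiently smooth on $\mathbb{R}^d$.

• $h \to 0$, $k h^d \to +\infty$, $k h^{d+2} \to +\infty$ when $k \to +\infty$.

Under these general conditions, we have

$$\sqrt{k h^d} \, \big( f_h(X_t) - f(X_t) - b_1(X_t, h) \big) \xrightarrow[k \to +\infty]{d} \mathcal{N}(0, \sigma_1(X_t))$$

$$\sqrt{k h^{d+2}} \, \big( \nabla f_h(X_t) - \nabla f(X_t) - b_2(X_t, h) \big) \xrightarrow[k \to +\infty]{d} \mathcal{N}(0, \Sigma_2(X_t))$$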



with : 



/j2 

bi(Xt,h) = o/i_s>o(^^) (^"i^d b2(Xt,h) = s(Xt) + Oh^oi^^) where the i*^ coordinate of 

s{Xt) is 



d^fjXt) , ^ ^^ d^f{Xt) 
^i(^t) = ^r4^ ^2{Xt) = cj2{Xt)ld with a^iXt) - ^^^'^'^^



dx{Xt) fi^dx{Xt 



The proof of Theorem 6.1 is similar to the proofs which can be found in Masry (1996) or Lu (1999) and is thus omitted. It should be noted that the assumption of i.i.d. nearest neighbors can be relaxed as in Masry (1996) or Lu (1999). We deduce directly from Theorem 6.1 that each coefficient of the matrix $\nabla f_h(X_t) \nabla f_h(X_t)^T$ converges in distribution toward the product of two normal distributions:

$$\forall (p, q) \in [1,\dots,d]^2, \quad \nabla f_h(X_t)_p \nabla f_h(X_t)_q \xrightarrow[k \to +\infty]{d} Z_p Z_q$$

where $\nabla f_h(X_t)_p \nabla f_h(X_t)_q$ denotes the coefficient of the $p$-th row and the $q$-th column of the matrix $\nabla f_h(X_t) \nabla f_h(X_t)^T$ and where $Z_i$ follows a Gaussian law $\mathcal{N}\big( \nabla f(X_t)_i + b_2(X_t, h)_i, \frac{1}{k h^{d+2}} \sigma_2(X_t) \big)$, with $\nabla f(X_t)_i$ and $b_2(X_t, h)_i$ the $i$-th coordinates of $\nabla f(X_t)$ and $b_2(X_t, h)$ respectively. More particularly, we have the following bias and variance for this estimator:

$$E[\nabla f_h(X_t)_p \nabla f_h(X_t)_q] - \nabla f(X_t)_p \nabla f(X_t)_q = \nabla f(X_t)_p b_2(X_t, h)_q + b_2(X_t, h)_p \nabla f(X_t)_q + b_2(X_t, h)_p b_2(X_t, h)_q = O_{h \to 0}(h)$$

n^fHiXt)pVh{X,),] = (3^ + 2^^(V/(X,), + 62(X„/.),)2 ifp = q 

= if p ^ q 

By supposing that $h \to 0$ and $k h^{d+2} \to +\infty$ when $k \to +\infty$, the consistency of $\nabla f_h(X_t) \nabla f_h(X_t)^T$ is thus obtained.
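To make the role of the gradient estimator more concrete, the sketch below fits a kernel-weighted local linear regression at a point and forms the outer product of the estimated gradient. It is only an illustration under stated assumptions: it uses a generic local linear fit with a Gaussian product kernel over all sample points, which is not necessarily the nearest-neighbor estimator (2.3) of the paper, and the regression function, noise level, sample size and bandwidth are hypothetical.

```python
import numpy as np

def local_linear_fit(X, y, x0, h):
    """Kernel-weighted least-squares fit of y on (1, X - x0).
    Returns the estimated value and gradient of the regression function at x0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    diff = X - x0
    # Gaussian product kernel with bandwidth h (an assumed kernel choice).
    w = np.exp(-0.5 * np.sum((diff / h) ** 2, axis=1))
    design = np.hstack([np.ones((len(X), 1)), diff])
    sw = np.sqrt(w)
    # Weighted least squares written as ordinary least squares on rescaled data.
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]

# Hypothetical data: f(x) = sin(x_1) + 0.5 * x_2 with homoscedastic noise.
rng = np.random.default_rng(0)
k, d, h = 4000, 2, 0.3
X = rng.uniform(-1.0, 1.0, size=(k, d))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(k)

x0 = np.array([0.3, -0.2])
f_hat, grad_hat = local_linear_fit(X, y, x0, h)
outer_hat = np.outer(grad_hat, grad_hat)   # estimate of grad f grad f^T at x0

grad_true = np.array([np.cos(0.3), 0.5])
print(grad_hat, grad_true)
print(outer_hat)
```

Shrinking `h` reduces the bias of the gradient estimate but inflates its variance unless `k` grows fast enough, which is the trade-off expressed by the conditions $h \to 0$ and $k h^{d+2} \to +\infty$.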

The asymptotic consistency of $g_{h_2}(X_t)$ toward $E[(x_{t+s} - f_h(X_t))^2]$ can also be achieved by using the general conditions of Theorem 6.1 if $h_2 \to 0$ and $k h_2^d \to +\infty$ when $k \to +\infty$. This outcome is obtained by replacing $f$ with $E[(x_{t+s} - f_h(X_t))^2]$ and $g(X_t)$ with a constant function in the statement of Theorem 6.1.

Because $f_h(X_t)$ is a consistent estimator of $f$ if $h \to 0$ and $k h^d \to +\infty$ when $k \to +\infty$, we have $E[(x_{t+s} - f_h(X_t))^2] \xrightarrow[k \to +\infty]{p} V[x_{t+s} - f(X_t)] = g(X_t)$ under these conditions. Hence,

$$g_{h_2}(X_t) \xrightarrow[k \to +\infty]{p} g(X_t)$$

if $h \to 0$, $h_2 \to 0$, $k h^d \to +\infty$ and $k h_2^d \to +\infty$.

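Similarly to $\nabla f_h(X_t)_p \nabla f_h(X_t)_q$, the asymptotic consistency of $\nabla g_{h_2}(X_t)_p \nabla g_{h_2}(X_t)_q$ can also be achieved by assuming $h_2 \to 0$ and $k h_2^{d+2} \to +\infty$ when $k \to +\infty$:

$$\nabla g_{h_2}(X_t)_p \nabla g_{h_2}(X_t)_q \xrightarrow[k \to +\infty]{p} \nabla E[(x_{t+s} - f_h(X_t))^2]_p \, \nabla E[(x_{t+s} - f_h(X_t))^2]_q$$

If $f_h(X_t)$ is a consistent estimator of $f$, we get the asymptotic consistency of $\nabla g_{h_2}(X_t)_p \nabla g_{h_2}(X_t)_q$ toward $\nabla g(X_t)_p \nabla g(X_t)_q$.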



Finally, from Theorem 6.1 and if we suppose $h \to 0$, $h_2 \to 0$, $k h^d \to +\infty$, $k h_2^d \to +\infty$, $k h^{d+2} \to +\infty$ and $k h_2^{d+2} \to +\infty$ when $k \to +\infty$, we have

$$I_{h,h_2}(X_t)_{pq} \xrightarrow[k \to +\infty]{p} I(X_t)_{pq}$$

where $I(X_t)_{pq}$ denotes the coefficient on the $p$-th row and $q$-th column of the matrix $I(X_t)$.

Figure 1: Empirical cumulative distribution functions of the estimated probabilities $\mathbb{P}(S_{h,h_2}(T) < 0.9)$ for the three AR(1) processes.




Figure 2: Empirical cumulative distribution functions of the estimated probabilities $\mathbb{P}(S_{h,h_2}(T) < 0.1)$ for the three AR(1) processes.

Figure 3: Empirical cumulative distribution functions of the estimated probabilities $\mathbb{P}(S_{h,h_2}(T) < 0.9)$ in the case $\phi_1 = 0.5$ when $h$ varies.




Figure 4: Empirical cumulative distribution functions of the estimated probabilities $\mathbb{P}(S_{h,h_2}(T) < 0.9)$ in the case $\phi_1 = 0.5$ when $h_2$ varies.

Figure 5: At the top: the simulated GARCH(1,1) process. At the middle: the corresponding time-varying theoretical statistical index $S(t)$. At the bottom: the corresponding time-varying probability $\mathbb{P}(S_{h,h_2}(t) < 1 - \epsilon)$.

Figure 6: At the top: daily returns of the S&P500 from 25/12/2002 to 18/02/2010. At the middle: the corresponding time-varying probability $\mathbb{P}(S_{h,h_2}(t) < 1 - \epsilon)$ for the smaller value of $\epsilon$. At the bottom: the corresponding time-varying probability $\mathbb{P}(S_{h,h_2}(t) < 1 - 1 \times 10^{-3})$.

Top panel title: Raw data (Pearson coeff. = 0.298, 95% CI = [0.258; 0.337])

Figure 7: At the top: $\mathbb{P}(S_{h,h_2}(t) < 1 - \epsilon)$ as a function of the squared daily returns of the S&P500 from 25/12/2002 to 18/02/2010. At the middle: 5-day moving average of $\mathbb{P}(S_{h,h_2}(t) < 1 - \epsilon)$ as a function of the 5-day moving average of the squared daily returns of the S&P500 from 25/12/2002 to 18/02/2010. At the bottom: 20-day moving average of $\mathbb{P}(S_{h,h_2}(t) < 1 - \epsilon)$ as a function of the 20-day moving average of the squared daily returns of the S&P500 from 25/12/2002 to 18/02/2010.






